Random centroid initialization for improving centroid-based clustering

Abstract

A method for improving centroid-based clustering is suggested. The improvement is built on diversifying the k-means++ initialization. The k-means++ algorithm, claimed to be a better version of k-means, is tested in a computational set-up where the dataset size, the number of features, and the number of clusters are varied. The statistics obtained from the testing show that, in roughly 50 % of the instances to cluster, k-means++ outputs worse results than k-means with random centroid initialization. The impact of random centroid initialization solidifies as both the dataset size and the number of features increase. To reduce the possible underperformance of k-means++, k-means is run on a separate processor core in parallel with the k-means++ algorithm, whereupon the better result is selected. The number of k-means++ runs is set to be not less than that of k-means. By incorporating the seeding of random centroid initialization, k-means++ gains about 0.05 in accuracy in every second instance to cluster.
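The idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`init_random`, `init_kmeanspp`, `kmeans`, `best_of_both`), the use of the sum of squared errors (SSE) as the selection criterion, and the default of three runs per initializer are all assumptions made for illustration. The sketch uses only the Python standard library and assumes the data contain more distinct points than clusters.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_random(points, k, rng):
    """Random initialization: k distinct data points drawn uniformly."""
    return rng.sample(points, k)


def init_kmeanspp(points, k, rng):
    """k-means++ seeding: each next centroid is drawn with probability
    proportional to its squared distance to the nearest chosen centroid.
    Assumes more distinct points than clusters (weights not all zero)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans(points, k, init, seed, max_iter=100):
    """Lloyd's iterations from the given initializer; returns (sse, centroids)."""
    rng = random.Random(seed)
    centroids = init(points, k, rng)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return sse, centroids


def best_of_both(points, k, seed=0, runs=3):
    """Run both initializers in separate worker processes and return the
    (sse, centroids, label) triple with the lowest SSE (assumed criterion).
    The number of k-means++ runs equals the number of random-init runs,
    which satisfies the 'not less than' condition stated in the abstract."""
    jobs = [(init_random, "random", seed + r) for r in range(runs)]
    jobs += [(init_kmeanspp, "k-means++", seed + r) for r in range(runs)]
    with ProcessPoolExecutor() as pool:
        futures = [(pool.submit(kmeans, points, k, init, s), label)
                   for init, label, s in jobs]
        results = [(*future.result(), label) for future, label in futures]
    return min(results, key=lambda r: r[0])
```

When run as a script, `best_of_both(points, k)` should be called under an `if __name__ == "__main__":` guard so that the worker processes start cleanly on every platform. Selecting by lowest SSE is only one plausible reading of "the better result is selected"; the paper may use a different measure.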

Similar articles

Pseudo-centroid clustering

Pseudo-Centroid Clustering replaces the traditional concept of a centroid expressed as a center of gravity with the notion of a pseudo-centroid (or a coordinate free centroid) which has the advantage of applying to clustering problems where points do not have numerical coordinates (or categorical coordinates that are translated into numerical form). Such problems, for which classical centroids ...

Novel centroid selection approaches for KMeans-clustering based recommender systems

Recommender systems have the ability to filter unseen information for predicting whether a particular user would prefer a given item when making a choice. Over the years, this process has been dependent on robust applications of data mining and machine learning techniques, which are known to have scalability issues when being applied for recommender systems. In this paper, we propose a k-means ...

Density-Based Centroid Approximation for Initializing Iterative Clustering Algorithms

We present KDI (Kernel Density Initialization), a density-based procedure for approximating centroids for the initialization step of iteration-based clustering algorithms. We show empirically that a rather low number of distance calculations in conjunction with a fast algorithm for finding the highest peaks are sufficient for effectively and efficiently finding a pre-specified number of good centroids, w...

Using Class Frequency for Improving Centroid-based Text Classification

Most previous works on text classification represented the importance of terms by term occurrence frequency (tf) and inverse document frequency (idf). This paper presents ways to apply class frequency in centroid-based text categorization. Three approaches are taken into account. The first one is to explore the effectiveness of inverse class frequency on the popular term weighting, i.e., TFIDF...

Improving Centroid-based Text Classification Using Term-distribution-based Weighting System and Clustering

Centroid-based text classification is one of the most popular supervised approaches to classify texts into a set of pre-defined classes with relatively low computation. Based on the vector-space model, the performance of this classification particularly depends on the way to weigh terms in documents in order to construct a representative class vector for each class and degree of spherical shape...

Journal

Journal title: Decision Making

Year: 2023

ISSN: 2560-6018, 2620-0104

DOI: https://doi.org/10.31181/dmame622023742